Elephant Habitat Suitability in Thailand’s Eastern Forest Complex
Comparing natural and human-impact Random Forest models
Authors
Isabelle Li, Xiao Yu, Qianmu Zheng
Master of Urban Spatial Analytics, University of Pennsylvania
Overview
Human expansion and landscape fragmentation have rapidly transformed elephant habitats across Southeast Asia. In Thailand’s Eastern Forest Complex (EFCOM), one of the last strongholds for Asian elephants, understanding how environmental and anthropogenic factors jointly shape habitat suitability is essential for conservation planning.
In this project, we develop two Random Forest models—one based solely on ecological variables, and another incorporating human-impact features such as roads and villages. Comparing the two suitability surfaces reveals the magnitude and spatial pattern of habitat loss. These results highlight potential corridor pathways, priority zones for restoration, and areas with elevated human–elephant conflict risks.
Elephants & Study Area
Asian Elephants: Why This Species Matters
Asian elephants (Elephas maximus) are an Endangered species and one of the last remaining large herbivores in Southeast Asia. Over the past century, their populations have declined sharply due to habitat loss, poaching, agricultural expansion, and increasing human–elephant conflict. These conflicts often arise when elephants leave protected forests in search of food, entering plantations or villages. The result is a complex landscape where conservation goals intersect directly with agricultural livelihoods and community safety.
Understanding where elephants prefer to live and how human activities reshape their available habitat is therefore essential for designing conservation strategies, anticipating risk, and reducing conflict.
Study Region
Thailand hosts roughly 4,000 wild elephants, and nearly half live in five major forest regions. Among them, the Eastern Forest Complex (EFCOM) is one of the largest and most ecologically significant. It also experiences some of the highest human–elephant conflict rates in the country, due to expanding plantations, road networks, and scattered settlements around its edges.
The elephants in EFCOM rely on a mosaic of evergreen forest, mixed forest, grassland, and water bodies within protected areas. However, the landscape around these forests is rapidly changing—rubber, palm, and fruit plantations form a patchwork around the protected core, creating narrow ecological corridors and limiting elephant movement.
The map below shows the geographic scope of this project—a section of the Eastern Forest Complex that includes protected forest, agricultural land, villages, water bodies, and transportation infrastructure. This spatial context is important for interpreting both elephant occurrence patterns and model results later in the analysis.
Study Region Boundary
Data Preparation: Environmental Covariates
To model elephant habitat suitability within the Eastern Forest Complex (EFCOM), we compiled a set of environmental predictors representing both natural habitat conditions and human-influenced landscape features. These covariates capture vegetation quality, land cover patterns, and the proximity of landscape elements that strongly shape elephant movement and space use.
The following subsections describe each covariate and present summary visualizations.
Land Cover Composition
The land cover map below shows the spatial distribution of major habitat types within the Eastern Forest Complex (EFCOM). Evergreen forest (deep green) dominates the interior of the protected area, forming the core ecological zone where most elephant activity is expected to occur. Surrounding the protected forest, large patches of cropland and plantation (yellow and light green) create a fragmented transition zone between forest and human-modified landscapes.
Urban and built-up areas (red) appear in scattered clusters, mostly along road networks and settlement centers, while water bodies (blue) are concentrated in the southern part of the region. This mosaic of forest, agriculture, villages, and infrastructure reflects the complex ecological and social environment in which elephants move.
The land cover distribution shows that tree cover dominates the study region, accounting for over half of all pixels and forming the ecological core of the Eastern Forest Complex. Cropland makes up about 22% of the area, creating a wide agricultural belt around the forest and representing the primary zone of human–elephant interaction and conflict.
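Grassland, built-up areas, and bare land occur in smaller patches but play an important role along forest edges and settlement clusters. Water bodies and wetlands represent less than 1% of the region but provide essential resources for elephant movement. Snow/Ice (11.25% of pixels) does not represent actual environmental conditions in Thailand, so this class is an artifact. One likely source is the class legend: the codes in this raster match the ESA WorldCover scheme, in which 70 denotes snow and ice, 80 permanent water bodies, and 95 mangroves. If the data come from WorldCover, the pixels labeled Snow/Ice here are probably open water.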
Code
import numpy as np
import pandas as pd

# Land cover class codes and the labels used in this analysis
class_meaning = {
    0: "No data", 10: "Tree cover", 20: "Shrubland", 30: "Grassland",
    40: "Cropland", 50: "Built-up", 60: "Bare/Sparse vegetation",
    80: "Snow/Ice", 90: "Wetlands", 95: "Water",
}

# Count pixels per class in the clipped land cover raster (lc_clip)
unique, counts = np.unique(lc_clip, return_counts=True)
df_lc = pd.DataFrame({
    "ClassValue": unique,
    "Meaning": [class_meaning.get(v, "Unknown") for v in unique],
    "Count": counts,
})
df_lc["Percentage"] = (df_lc["Count"] / df_lc["Count"].sum() * 100).round(2)
df_lc
   ClassValue                 Meaning     Count  Percentage
0           0                 No data    292666        0.55
1          10              Tree cover  28505457       53.90
2          20               Shrubland      3019        0.01
3          30               Grassland   4133549        7.82
4          40                Cropland  11810797       22.33
5          50                Built-up   1499059        2.83
6          60  Bare/Sparse vegetation    292243        0.55
7          80                Snow/Ice   5949448       11.25
8          90                Wetlands     91315        0.17
9          95                   Water    312213        0.59
Land cover types overview
To understand which environments elephants use within EFCOM, the land cover at occurrence points should be compared with what the landscape offers. The bar chart below ranks the land cover classes by percentage. As coded, it is built from df_lc, which counts every pixel in lc_clip, so its values are identical to the regional table above: they describe the land cover available to elephants rather than the land cover at the occurrence points themselves.
The pattern is consistent with ecological expectations:
- Tree cover dominates (53.9%), consistent with forests forming the core habitat for wild elephants.
- Cropland (22.3%) is the second-largest category, forming the agricultural belt that elephants frequently travel through or feed in, one of the main drivers of human–elephant conflict.
- Grassland (7.8%) marks mixed or transitional landscapes, while the Snow/Ice class (11.3%) is the classification artifact noted above rather than a habitat.
- Built-up and water areas each account for less than 3%, so heavily disturbed zones cover only a small part of the region.
Overall, this composition shows a predominantly forested landscape closely bordered by human-modified land uses, the setting in which elephants regularly approach farms and villages.
Code
import matplotlib.pyplot as plt

# Sort classes by percentage (largest to smallest)
df_plot = df_lc.sort_values("Percentage", ascending=False).copy()

# Map each class code to its display color (class_colors: code -> color, defined elsewhere in the notebook)
df_plot["Color"] = df_plot["ClassValue"].map(class_colors)

plt.figure(figsize=(11, 6))
plt.grid(axis="x", linestyle="--", alpha=0.4)
plt.barh(df_plot["Meaning"], df_plot["Percentage"], color=df_plot["Color"])
plt.gca().invert_yaxis()  # put the largest class at the top
plt.xlabel("Percentage (%)")

# Label each bar with its percentage
for i, v in enumerate(df_plot["Percentage"]):
    plt.text(v + 0.3, i, f"{v:.1f}%", va="center")

plt.tight_layout()
plt.show()
Land Cover Types in the Study Region
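A per-occurrence breakdown requires reading the land cover value under each presence point instead of counting every pixel of the clipped raster, as df_lc does. The sketch below is a minimal illustration, assuming a north-up raster whose upper-left corner is (x0, y0) with pixel sizes pixel_w and pixel_h, and point coordinates already in the raster's CRS; `sample_classes` and `class_shares` are hypothetical helpers and the toy raster is made up. With rasterio, a dataset's `index(x, y)` or `sample()` method performs the same coordinate-to-pixel lookup.

```python
import numpy as np
import pandas as pd

def sample_classes(raster, x0, y0, pixel_w, pixel_h, xs, ys):
    """Return the raster value under each (x, y) point.

    Assumes a north-up raster: col = (x - x0) / pixel_w, row = (y0 - y) / pixel_h,
    with (x0, y0) the upper-left corner. Points outside the raster get -1.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    cols = np.floor((xs - x0) / pixel_w).astype(int)
    rows = np.floor((y0 - ys) / pixel_h).astype(int)
    inside = (rows >= 0) & (rows < raster.shape[0]) & (cols >= 0) & (cols < raster.shape[1])
    values = np.full(xs.shape, -1, dtype=int)
    values[inside] = raster[rows[inside], cols[inside]]
    return values

def class_shares(values, class_meaning):
    """Percentage of points in each land cover class (points outside the raster dropped)."""
    s = pd.Series(values[values >= 0]).map(lambda v: class_meaning.get(v, "Unknown"))
    return (s.value_counts(normalize=True) * 100).round(2)

# Toy 3 x 3 raster with 10 m pixels and its upper-left corner at (0, 30)
toy = np.array([[10, 10, 40],
                [10, 40, 40],
                [95, 10, 10]])
meaning = {10: "Tree cover", 40: "Cropland", 95: "Water"}
pts_x = [5, 15, 25, 5, 100]   # the last point lies outside the raster
pts_y = [25, 25, 25, 5, 5]
vals = sample_classes(toy, 0, 30, 10, 10, pts_x, pts_y)
print(vals)
print(class_shares(vals, meaning))
```

Running `class_shares` on the presence points and comparing the result with the Percentage column of df_lc would show whether elephants occur in forest or cropland more or less often than those classes' share of the landscape.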
NDVI
We assembled NDVI data covering the entire study area to represent vegetation conditions. NDVI provides a direct indication of vegetation health and habitat quality and was incorporated as a key environmental variable to help the model characterize natural resource availability across the landscape.
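NDVI is computed per pixel from red and near-infrared (NIR) reflectance as NDVI = (NIR − Red) / (NIR + Red), giving values between −1 and 1, with dense green vegetation scoring high and water typically negative. The source does not say which sensor the NDVI layer comes from, so this minimal sketch simply takes two reflectance arrays (the toy values are made up) and returns NaN where the denominator is zero.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    valid = denom != 0          # avoid division by zero on empty pixels
    out[valid] = (nir[valid] - red[valid]) / denom[valid]
    return out

# Toy reflectances: dense vegetation, bare soil, water, empty pixel
nir = np.array([0.50, 0.30, 0.02, 0.0])
red = np.array([0.05, 0.25, 0.06, 0.0])
print(ndvi(nir, red).round(3))
```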
We also collected spatial layers representing major roads, village built-up areas, and water bodies across the study region. Water features represent natural landscape elements, while roads and villages are human-made. To translate these features into variables the model could use, we calculated the shortest distance from each presence and background point to the nearest feature of each type. These distance-based measurements quantify how proximity to natural and anthropogenic features influences elephant movement.
Study Region: Eastern Forest Complex and surrounding human features.
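The shortest distance from a point to a road is the minimum distance to any of the road's straight segments. A minimal numpy sketch, assuming coordinates are already in a projected CRS measured in meters (distances computed in latitude/longitude degrees would be distorted) and each road is given as an array of vertices; `point_to_polyline_distance` is a hypothetical helper, and GeoPandas `GeoSeries.distance` does the same job on real geometries.

```python
import numpy as np

def point_to_polyline_distance(points, polylines):
    """Minimum Euclidean distance from each point to a set of polylines.

    points:    (n, 2) array of x, y coordinates (projected, e.g. meters)
    polylines: list of (k_i, 2) vertex arrays; consecutive vertices form segments
    """
    points = np.asarray(points, dtype=float)
    # Stack every segment of every polyline as start (a) and end (b) points
    a = np.vstack([np.asarray(line, float)[:-1] for line in polylines])
    b = np.vstack([np.asarray(line, float)[1:] for line in polylines])
    ab = b - a                                    # (m, 2) segment vectors
    ap = points[:, None, :] - a[None, :, :]       # (n, m, 2) offsets from segment starts
    seg_len2 = np.sum(ab ** 2, axis=1)            # squared segment lengths
    # Projection parameter t of each point onto each segment, clamped to [0, 1]
    with np.errstate(invalid="ignore", divide="ignore"):
        t = np.sum(ap * ab[None, :, :], axis=2) / seg_len2[None, :]
    # Guard zero-length segments (a == b, so any t gives the same point)
    t = np.clip(np.nan_to_num(t), 0.0, 1.0)
    closest = a[None, :, :] + t[:, :, None] * ab[None, :, :]
    dists = np.linalg.norm(points[:, None, :] - closest, axis=2)
    return dists.min(axis=1)

# One straight road along the x-axis and one L-shaped road (toy coordinates, meters)
roads = [np.array([[0, 0], [100, 0]]),
         np.array([[200, 50], [200, 150], [300, 150]])]
pts = np.array([[50, 30], [-40, 30], [260, 100]])
print(point_to_polyline_distance(pts, roads))
```

Polygon outlines such as village footprints can be passed the same way, but that yields the distance to the boundary, so points inside a polygon would need a separate containment check.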
Correlation Between Features
The correlation matrix shows that most predictors are not strongly correlated. NDVI is moderately negatively correlated with distance to water and distance to villages, meaning that greener areas tend to lie closer to water and to settlement edges. The strongest correlations appear among the distance variables: distance to village and distance to water (0.89), and distance to road and distance to village (0.64), which is expected because settlements, roads, and water sources often cluster spatially. Random Forest predictions are relatively robust to correlated predictors, so all variables can be included in the model. A correlation of 0.89 is nonetheless high, so the importance scores of distance to village and distance to water should be interpreted with caution, since correlated predictors can share or swap importance.
Code
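import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise Pearson correlations among the continuous predictors
plt.figure(figsize=(6, 5))
sns.heatmap(
    df[["NDVI", "dist_water", "dist_road", "dist_village"]].corr(),
    annot=True,
    cmap="YlGnBu",
)
plt.title("Correlation Between Features")
plt.show()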
Correlation Between Features
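Pairwise correlations can miss a predictor that is explained by a combination of others. The variance inflation factor (VIF) checks this by regressing each predictor on all the others: VIF_j = 1 / (1 − R²_j), and values above roughly 5 to 10 are a common warning sign. This minimal numpy sketch uses synthetic data with hypothetical column roles; it is an extra diagnostic, not part of the original workflow.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors: village distance strongly tied to water distance, NDVI independent
rng = np.random.default_rng(0)
dist_water = rng.uniform(0, 5000, 500)
dist_village = 0.9 * dist_water + rng.normal(0, 800, 500)
ndvi_vals = rng.uniform(0.1, 0.9, 500)
X = np.column_stack([ndvi_vals, dist_water, dist_village])
print(vif(X).round(2))
```

For two predictors alone, r = 0.89 implies VIF = 1 / (1 − 0.89²) ≈ 4.8, close to the usual threshold.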
Model Building
In this section, we developed two sets of Random Forest classification models to understand how environmental and human-modified features influence elephant habitat use within the Eastern Forest Complex (EFCOM). We evaluated two conceptual frameworks:
Natural Model – NDVI, land cover type, and distance to water
Human-Impact Model – adds distance to roads and villages
These two models allow us to directly compare how much additional explanatory power is introduced when incorporating anthropogenic disturbance variables.
Random Forest Method
Random Forest is an ensemble machine learning algorithm that builds multiple decision trees and combines their predictions to produce a final, more stable output. Instead of relying on a single tree—which may overfit or be strongly influenced by noise—Random Forest averages across many trees, each trained on slightly different subsets of the data and features.
Step 1 – Bootstrap Sampling: Each tree is trained on a random sample of the training data drawn with replacement.
Step 2 – Random Feature Selection: At each split inside a tree, only a random subset of predictors is considered. This reduces the correlation between trees and makes the ensemble more robust.
Step 3 – Aggregation (Ensemble Prediction): For classification tasks such as this one, each tree votes on the class and the majority vote becomes the final prediction. scikit-learn's RandomForestClassifier instead averages the trees' predicted class probabilities, which yields a continuous probability of presence that can serve as a suitability score.
Random Forest Diagram
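The three steps can be sketched with scikit-learn decision trees: numpy draws the bootstrap sample (Step 1), `max_features="sqrt"` makes each tree consider a random subset of predictors at every split (Step 2), and a majority vote combines the trees (Step 3). This is a teaching sketch on synthetic data, assuming scikit-learn is installed; `fit_forest` and `predict_forest` are hypothetical names, and `RandomForestClassifier` does all of this internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    n = len(X)
    for i in range(n_trees):
        # Step 1: bootstrap sample (row indices drawn with replacement)
        idx = rng.integers(0, n, size=n)
        # Step 2: each split considers only a random subset of the features
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    # Step 3: each tree votes; the majority class wins (binary 0/1 labels, odd tree count)
    votes = np.stack([tree.predict(X) for tree in trees])   # (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)

# Synthetic presence (1) / background (0) data with four predictors
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
trees = fit_forest(X[:300], y[:300])
pred = predict_forest(trees, X[300:])
print("held-out accuracy:", (pred == y[300:]).mean())
```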
Approach
To model elephant habitat suitability, we trained a Random Forest classifier using presence–background data. The occurrence dataset consisted of all 260 available GPS elephant presence points within the study region. To represent the environmental conditions available to elephants, we generated 4,000 random background points across the landscape, covering both forested and human-modified areas.
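Background points are usually drawn uniformly from the valid cells of the study area. The source only states that 4,000 random points were generated, so this is a minimal sketch under assumptions: a boolean mask of valid pixels (for example, the clipped land cover raster with NoData excluded), a north-up grid with upper-left corner (x0, y0), and uniform jitter within each chosen cell. `sample_background` is a hypothetical helper.

```python
import numpy as np

def sample_background(valid_mask, x0, y0, pixel_w, pixel_h, n_points, seed=0):
    """Draw n_points random (x, y) locations uniformly inside valid cells of a north-up raster."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(valid_mask)               # indices of usable cells
    pick = rng.integers(0, len(rows), size=n_points)  # equal-area cells, so uniform choice
    # Uniform random position within each chosen cell
    jx = rng.random(n_points)
    jy = rng.random(n_points)
    xs = x0 + (cols[pick] + jx) * pixel_w
    ys = y0 - (rows[pick] + jy) * pixel_h
    return xs, ys

# Toy 4 x 4 land cover grid whose left column is NoData (class 0)
lc = np.array([[0, 10, 10, 40],
               [0, 10, 40, 40],
               [0, 30, 10, 10],
               [0, 10, 10, 95]])
xs, ys = sample_background(lc != 0, x0=0, y0=40, pixel_w=10, pixel_h=10, n_points=4000)
print(xs.min(), xs.max(), ys.min(), ys.max())
```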
Each point—presence or background—was annotated with key environmental variables (NDVI, land cover, distance to water) as well as human-impact variables (distance to roads and villages).
We then fit two Random Forest models, one using only natural predictors and another incorporating both natural and anthropogenic variables. Both models were trained on a stratified 70/30 train–test split, which preserves the presence–background ratio in each part, and hyperparameters were left at standard defaults so that the two models differ only in their predictors. This workflow allowed us to compare how landscape features shape elephant habitat use and to evaluate how much human-disturbance variables add to predictive performance.
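The training workflow can be sketched with scikit-learn as follows. The data are synthetic stand-ins sized to mimic 260 presences and 4,000 background points; column names follow the correlation code above (`NDVI`, `dist_water`, `dist_road`, `dist_village`) plus a hypothetical `landcover` column, and the `random_state` values are assumptions added so the run is repeatable. Land cover codes are used as plain numbers here, a simplification that lets trees split them as if they were ordered.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the annotated presence/background table
rng = np.random.default_rng(1)
n_pres, n_bg = 260, 4000
y = np.r_[np.ones(n_pres, int), np.zeros(n_bg, int)]
df = pd.DataFrame({
    "NDVI": rng.uniform(0, 1, y.size) + 0.2 * y,
    "landcover": rng.choice([10, 30, 40, 50], size=y.size),
    "dist_water": rng.uniform(0, 5000, y.size),
    "dist_road": rng.uniform(0, 8000, y.size),
    "dist_village": rng.uniform(0, 8000, y.size) + 4000 * y,
    "presence": y,
})

features_natural = ["NDVI", "landcover", "dist_water"]
features_human = features_natural + ["dist_road", "dist_village"]

# Stratified 70/30 split keeps the presence/background ratio in both parts
train, test = train_test_split(df, test_size=0.3, stratify=df["presence"], random_state=0)

models, aucs = {}, {}
for name, feats in [("natural", features_natural), ("human", features_human)]:
    rf = RandomForestClassifier(random_state=0)      # default hyperparameters
    rf.fit(train[feats], train["presence"])
    prob = rf.predict_proba(test[feats])[:, 1]       # probability of presence
    models[name] = rf
    aucs[name] = roc_auc_score(test["presence"], prob)

print({k: round(v, 3) for k, v in aucs.items()})
```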
Why Random Forest Is Well-Suited for This Project
Random Forest is particularly well-suited for this project because elephant habitat selection is driven by complex, nonlinear interactions between natural environmental features and human-modified landscapes. Variables such as NDVI, land cover, distance to water, roads, and villages do not influence elephants in simple linear ways; instead, their effects often depend on thresholds, combinations, and spatial context. Random Forest can naturally capture these nonlinear ecological responses without requiring explicit functional assumptions, making it ideal for modeling habitat suitability.
Together with built-in measures of variable importance and strong predictive accuracy, this flexibility makes Random Forest one of the most widely used machine learning approaches in habitat suitability and conservation studies.
Natural model AUC: 0.714
Human-impact model AUC: 0.82
Result
Variable Importance
Model results show that NDVI and land cover are the strongest drivers of suitability, with elephants clearly tied to forest quality. In the human-impact model, distance to village becomes similarly important, indicating how settlement edges shape elephant movement, while water and roads add smaller but meaningful contributions.
Code
import matplotlib.pyplot as plt
import pandas as pd

# Impurity-based importances from each fitted model
imp_nat = pd.Series(rf_nat.feature_importances_, index=features_natural)
imp_hum = pd.Series(rf_hum.feature_importances_, index=features_human)

# Align on feature names; features absent from the natural model get 0
df_compare = pd.DataFrame({"Natural": imp_nat, "Human": imp_hum}).fillna(0)

df_compare.plot(kind="barh", figsize=(8, 6), color=["#1f78b4", "#33a02c"])
plt.xlabel("Feature Importance")
plt.title("Natural vs Human-impact Model Feature Importance")
plt.tight_layout()
plt.show()
Natural vs Human-impact Model Feature Importance
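Impurity-based `feature_importances_` are computed on training data and tend to favor continuous predictors over a categorical one such as land cover. Permutation importance, a swapped-in cross-check not used in the original analysis, measures how much held-out AUC drops when one column is shuffled. Neither method fully separates strongly correlated predictors such as distance to village and distance to water. A minimal sketch with scikit-learn's `permutation_importance` on synthetic data; the variables are stand-ins, not the project's `rf_hum` or test set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: column 0 is informative, column 1 is pure noise
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)
names = ["informative", "noise"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each held-out column 10 times and record the mean drop in AUC
result = permutation_importance(rf, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0)
for name, mean, std in zip(names, result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```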
ROC curves
Both models separate presence points from background points better than chance, but the human-impact model achieves a higher AUC (0.82 versus 0.714), showing that human landscape features improve prediction. Its ROC curve rises more steeply, indicating a stronger ability to distinguish presence from background locations, which is consistent with the high importance of distance to village.
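AUC has a direct interpretation: it is the probability that a randomly chosen presence point receives a higher predicted score than a randomly chosen background point, with ties counted as one half. An AUC of 0.82 therefore means that in about 82% of presence–background pairs the presence point scores higher. A minimal numpy sketch of this pairwise definition (`pairwise_auc` is a hypothetical name; the toy scores are made up); scikit-learn's `roc_auc_score` returns the same value.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """AUC as P(presence score > background score); ties count as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos = scores[labels][:, None]     # presence scores as a column
    neg = scores[~labels][None, :]    # background scores as a row
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.6, 0.2, 0.1, 0.4]
print(pairwise_auc(scores, labels))
```

Comparing all pairs costs memory proportional to presences × background points, which is fine at this project's scale (roughly 78 × 1,200 test points).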
The natural-only model highlights large, continuous forest interiors as the strongest habitat for elephants, representing the landscape that would exist if ecological factors alone shaped movement. This smooth pattern shows broad areas of intact forest functioning as cohesive habitat.
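A suitability surface like the ones mapped below comes from applying a fitted classifier to every pixel: stack the covariate rasters, reshape them into a pixels × features table, predict the presence probability, and reshape back to the grid. A minimal sketch, assuming all covariate rasters are aligned on the same grid, NaN marks NoData, and `model` is any fitted classifier with a scikit-learn style `predict_proba`; `predict_surface` and `ToyModel` are hypothetical.

```python
import numpy as np

def predict_surface(model, layers):
    """Apply model.predict_proba to each pixel of aligned 2-D covariate layers.

    layers: list of 2-D arrays in the same order as the model's features.
    Returns a 2-D array of presence probabilities, NaN where any layer is NaN.
    """
    stack = np.stack(layers, axis=-1)          # (rows, cols, n_features)
    rows, cols, n_feat = stack.shape
    table = stack.reshape(-1, n_feat)          # one row per pixel
    valid = ~np.isnan(table).any(axis=1)
    out = np.full(rows * cols, np.nan)
    if valid.any():
        out[valid] = model.predict_proba(table[valid])[:, 1]
    return out.reshape(rows, cols)

class ToyModel:
    """Stand-in classifier: presence probability equals the first feature, clipped to [0, 1]."""
    def predict_proba(self, X):
        p = np.clip(X[:, 0], 0, 1)
        return np.column_stack([1 - p, p])

ndvi_layer = np.array([[0.8, 0.2], [np.nan, 0.5]])
dist_layer = np.array([[100.0, 900.0], [300.0, 50.0]])
surface = predict_surface(ToyModel(), [ndvi_layer, dist_layer])
print(surface)
```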
When human-impact variables are added, the suitability surface becomes sharply fragmented. Roads and settlements break up habitat that appears continuous in the natural-only model, leaving visible linear barriers across the landscape. At the same time, certain areas show increased suitability in the human-impact model. This likely occurs partly because cropland, especially cornfields, attracts elephants, and partly because human disturbance compresses elephants into narrower corridors. This compression elevates predicted occurrence probabilities and increases the contrast between high- and low-suitability zones.
The difference map (natural minus human-impact suitability) reinforces this pattern. Human activity has the strongest negative impact in the northern and southern forest belts, where suitability drops substantially once roads, villages, and agricultural edges are accounted for. Most of these areas appear red, indicating habitat that would be highly suitable under natural conditions but is now degraded by fragmentation. Only limited blue patches appear, corresponding to agricultural edges where suitability rises due to food availability and forced movement. Overall, the maps illustrate how human land use simultaneously reduces the extent of natural habitat and intensifies elephant presence in a few high-risk interface zones.
Code
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import rasterio

base = "data/"
nat_path = base + "suitability_natural.tif"
hum_path = base + "suitability_human.tif"
diff_path = base + "difference_map.tif"

# Read the first band of each raster
with rasterio.open(nat_path) as src:
    suit_nat = src.read(1)
with rasterio.open(hum_path) as src:
    suit_hum = src.read(1)
with rasterio.open(diff_path) as src:
    diff = src.read(1)

# Shared color range for both suitability maps
vmin_suit, vmax_suit = 0, 0.25
fig, axs = plt.subplots(1, 3, figsize=(20, 7))

# 1. Natural suitability
im1 = axs[0].imshow(suit_nat, vmin=vmin_suit, vmax=vmax_suit, cmap="viridis")
axs[0].set_title("Natural Habitat Suitability", fontsize=13)
axs[0].axis("off")
cbar1 = plt.colorbar(im1, ax=axs[0], shrink=0.8)
cbar1.set_label("Suitability", fontsize=10)

# 2. Human-impact suitability
im2 = axs[1].imshow(suit_hum, vmin=vmin_suit, vmax=vmax_suit, cmap="viridis")
axs[1].set_title("Human-impact Habitat Suitability", fontsize=13)
axs[1].axis("off")
cbar2 = plt.colorbar(im2, ax=axs[1], shrink=0.8)
cbar2.set_label("Suitability", fontsize=10)

# 3. Difference map: differences smaller than the threshold are set to 0 (white)
th = 0.02
diff_masked = diff.copy()
diff_masked[np.abs(diff) < th] = 0
colors = [
    (0.3, 0.6, 1.0),  # blue: human > natural
    (1.0, 1.0, 1.0),  # white: no significant difference
    (1.0, 0.3, 0.3),  # red: natural > human
]
cmap_custom = mcolors.LinearSegmentedColormap.from_list("custom", colors)
im3 = axs[2].imshow(diff_masked, cmap=cmap_custom, vmin=-0.3, vmax=0.3)
axs[2].set_title("Difference (Natural – Human)", fontsize=13)
axs[2].axis("off")
cbar3 = plt.colorbar(im3, ax=axs[2], shrink=0.8)
cbar3.set_label("Δ Suitability", fontsize=10)

plt.tight_layout()
plt.show()
Suitability Maps
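The visual impression that most changed pixels are red can also be checked numerically. A minimal sketch, assuming the two suitability rasters are aligned arrays with NaN for NoData and reusing the 0.02 threshold from the plotting code above; `summarize_difference` is a hypothetical helper and the toy values are not project results.

```python
import numpy as np

def summarize_difference(suit_nat, suit_hum, threshold=0.02):
    """Share of valid pixels where natural - human suitability passes +/- threshold."""
    diff = suit_nat - suit_hum
    d = diff[~np.isnan(diff)]                 # drop NoData pixels
    return {
        "loss (natural > human)": float(np.mean(d >= threshold)),
        "gain (human > natural)": float(np.mean(d <= -threshold)),
        "no clear change": float(np.mean(np.abs(d) < threshold)),
    }

suit_nat = np.array([[0.20, 0.15], [0.05, np.nan]])
suit_hum = np.array([[0.05, 0.14], [0.10, 0.08]])
print(summarize_difference(suit_nat, suit_hum))
```

Applied to the full rasters, these shares would put a number on how much of the forest belt loses suitability relative to the few agricultural edges where it rises.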
Conclusion
This project demonstrates how elephant habitat patterns in the Eastern Forest Complex are shaped not only by natural ecological conditions but also by the structure and intensity of human land use. While the natural-only model reveals large, cohesive forest interiors as core habitat, integrating human-impact variables exposes how roads, villages, and agricultural expansion fragment these landscapes. In some edge zones, suitability appears to increase due to crop attractiveness and the compression of elephant movement into narrower corridors—patterns that highlight where conflict risk is likely to rise as natural habitat is lost or subdivided.
Taken together, the results underscore the dual role of conservation and land-use planning in shaping future human–elephant interactions. By revealing where habitat quality is suppressed, where suitability is artificially elevated, and where fragmentation most disrupts movement—particularly in the northern and southern forest belts—the models provide a practical framework for identifying priority areas for protection and conflict mitigation.
Although our models were trained on real-world data from a human-modified landscape, they can still be used to explore more “natural” scenarios. This rests on the assumption that the models capture what elephants prefer, not just where they happen to be found under current constraints, and that these underlying preferences stay the same even when human activities restrict elephants’ movement. Under that assumption, using today’s data to imagine how elephants would use the landscape without human disturbance is reasonable, and it helps illustrate how much the environment, and potential conflict zones, might change if human pressure were reduced.